Oxone: A Scalable Solution for Detecting Superior Quality Deltas on Ordered Large XML Documents

Authors

  • Erwin Leonardi
  • Sourav S. Bhowmick
Abstract

Recently, a number of relational-based approaches for detecting changes to XML data have been proposed to address the scalability problem of main memory-based approaches (e.g., X-Diff, XyDiff). These approaches store the XML documents in a relational database and issue SQL queries (whenever appropriate) to detect the changes. In this paper, we propose a relational-based ordered XML change detection technique (called OXONE) that uses a schema-conscious approach as the underlying storage strategy for XML data. Previous efforts have focused on detecting changes to ordered XML in a schema-oblivious storage environment. Although the schema-oblivious approach produces better result quality than XyDiff (a main memory-based ordered XML change detection approach), its performance degrades as data size increases, and it is slower than XyDiff for smaller data sets. We propose a technique to overcome these limitations. Our experimental results show that OXONE is up to 22 times faster and more scalable than the relational-based schema-oblivious approach. The performance of OXONE and XyDiff (C version) is comparable. More importantly, our approach is more scalable than XyDiff for larger data sets and produces deltas of much higher quality than XyDiff.
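The general idea of storing two document versions in a relational database and using SQL to find the differences can be illustrated with a toy sketch. This is not OXONE's storage schema or algorithm: the single `leaf` table, the position-based path encoding, and the function names `shred` and `detect_changes` are all assumptions made for illustration, and the sketch only compares leaf elements at identical ordered paths.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, version, xml_text):
    """Store each leaf element of one document version as a (version, path, value) row."""
    def walk(node, path):
        children = list(node)
        if not children:
            conn.execute(
                "INSERT INTO leaf (version, path, value) VALUES (?, ?, ?)",
                (version, path, (node.text or "").strip()),
            )
        # The child position is part of the path, so sibling order matters.
        for position, child in enumerate(children, start=1):
            walk(child, f"{path}/{child.tag}[{position}]")

    root = ET.fromstring(xml_text)
    walk(root, f"/{root.tag}")


def detect_changes(old_xml, new_xml):
    """Return inserted, deleted, and updated leaf paths between two versions."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical single-table layout, chosen only to keep the sketch short.
    conn.execute("CREATE TABLE leaf (version INTEGER, path TEXT, value TEXT)")
    shred(conn, 1, old_xml)
    shred(conn, 2, new_xml)

    # Same path in both versions, different text: treated as an update.
    updated = conn.execute(
        """SELECT o.path, o.value, n.value
           FROM leaf AS o JOIN leaf AS n ON o.path = n.path
           WHERE o.version = 1 AND n.version = 2 AND o.value <> n.value
           ORDER BY o.path"""
    ).fetchall()

    # Paths present in one version but absent from the other.
    only_in = """SELECT a.path FROM leaf AS a
                 WHERE a.version = ?
                   AND NOT EXISTS (SELECT 1 FROM leaf AS b
                                   WHERE b.version = ? AND b.path = a.path)
                 ORDER BY a.path"""
    deleted = [row[0] for row in conn.execute(only_in, (1, 2))]
    inserted = [row[0] for row in conn.execute(only_in, (2, 1))]
    conn.close()
    return {"inserted": inserted, "deleted": deleted, "updated": updated}
```

Calling `detect_changes` with two small XML strings returns a dictionary of path lists. Because matching here is purely positional, a real technique would need a much more careful matching strategy to avoid reporting a moved subtree as a long run of unrelated updates.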

Similar resources

Xandy: A scalable change detection technique for ordered XML documents using relational databases

Previous work in change detection on XML documents is not suitable for detecting changes to large XML documents, as it requires a lot of memory to keep the two versions of the XML documents in memory. In this article, we take a more conservative yet novel approach of using traditional relational database engines for detecting the changes to large ordered XML documents. To this end, we have i...

Detecting Content Changes on Ordered XML Documents Using Relational Databases

Previous work in change detection on XML focused on detecting changes to text files using ordered and unordered tree models. These approaches are not suitable for detecting changes to large XML documents, as they require a lot of memory to keep the two versions of the XML documents in memory. In this paper, we take a more conservative yet novel approach of using traditional relational database engi...

Detecting Changes to Hybrid XML Documents Using Relational Databases

Recent works in XML change detection have focused on detecting changes to ordered or unordered XML documents. However, in real life XML documents may not always be purely ordered or purely unordered. It is indeed possible to have both ordered and unordered nodes in the same XML document (such documents are called hybrid XML). In this paper, we present a technique for detecting the changes to hy...

Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity

Due to the increasing number of XML documents, organizing them effectively in order to retrieve useful information from them is essential. A possible solution is to cluster the XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...

DTD-Diff: A Change Detection Algorithm for DTDs

The DTD of a set of XML documents may change for many reasons, such as changes to real-world events, changes to the user’s requirements, and mistakes in the initial design. In this paper, we present a novel algorithm called DTD-Diff to detect the changes to DTDs that define the structure of a set of XML documents. Such a change detection tool can be useful in several ways such as maintenan...

Publication date: 2006